Clustering of Polysemic Words

Authors

  • Laurent Cicurel
  • Stephan Bloehdorn
  • Philipp Cimiano
Abstract

In this paper, we propose an approach for constructing clusters of related terms that can later be used to derive formal conceptual structures. In contrast to previous approaches in this direction, we explicitly take into account the fact that words can have different, possibly even unrelated, meanings. To account for such ambiguities in word meaning, we consider two alternative soft clustering techniques, namely Overlapping Pole-Based Clustering (PoBOC) and Clustering by Committees (CBC). These soft clustering algorithms are used to detect the different contexts of the clustered words, so that a word may belong to more than one cluster. We report on initial experiments conducted on textual data from the tourism domain.

Similar resources

Synonym Dictionary Improvement through Markov Clustering and Clustering Stability

Abstract. The aim of the work presented here is to clean up a dictionary of synonyms which appeared to be ambiguous, incomplete and inconsistent. The key idea is to use Markov Clustering and Clustering Stability techniques on the network that represents the synonymy relation contained in the dictionary. Each densely connected cluster is considered to correspond to a specific concept, and ambigu...

Clustering Polysemic Subcategorization Frame Distributions Semantically

Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data. We describe a new approach which involves clustering subcategorization frame (SCF) distributions using the Information Bottleneck and nearest neighbour methods. In contrast to previous work, we particularly focus on clustering polysemic verbs. A novel evaluation schem...

Extraction of Translation Equivalents from Parallel Corpora Using Sense-sensitive Contexts

The paper proposes an unsupervised method to extract translation equivalents from parallel corpora. The strategy we use takes into account the context of words. Given a word of the source language and a particular context, we learn its word translation within an equivalent context. We first extract pairs of similar contexts and, then, we compare the similarity between words appearing in each pa...

Lexicalised Systematic Polysemy in WordNet

This paper describes an attempt to gain more insight into the mechanisms that underlie lexicalised systematic polysemy. This phenomenon is interpreted as systematic sense combinations that are valid for more than one word. The hierarchical structure of WordNet is exploited to create a working definition of systematic polysemy and extract polysemic patterns at a level of generalisation that allo...

Visualizing polysemy using LSA and the predication algorithm

Context is a determining factor in language, and plays a decisive role in polysemic words. Several psycholinguistically-motivated algorithms have been proposed to emulate human management of context, under the assumption that the value of a word is evanescent and takes on meaning only in interaction with other structures. The predication algorithm (Kintsch, 2001), for example, uses a vector rep...

Publication date: 2006